Some housekeeping (again), installing necessary packages.

list.of.packages <- c("igraph", "tidygraph", "ggraph", "data.table", "dplyr", "magrittr", "skimr", "stringr", "Matrix", "bibliometrix")

new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
rm(list.of.packages, new.packages)

1 Introduction

Again, the executable Kaggle notebook can be found here.

1.1 Reminder: What’s the deal with networks?

  • Looking at systems as a network often brings a fresh perspective
  • It’s a completely different data-concept
  • By definition, things are relational, so they somewhat depend on each other (vs. independent observations in a dataframe)
  • It’s kind of messy
  • But it makes pretty and impressive graphs
  • It has its own rules, semantics, indicators, algorithms

1.2 The Data Structure of Graphs

So, briefly back to the question: Why all the fuss with graph objects and the like?

  • Tabular data
    • In tabular data, summary statistics of variables are interdependent between observations (column-wise): changing the value of one observation changes the summary statistics of the corresponding variable.
    • Likewise, variable values might be interdependent within an observation (row-wise): changing one variable’s value might change the summary statistics of that observation.
    • Otherwise, values are (at least mathematically) independent.
  • Graph data
    • The same holds true, but there are additional interdependencies due to the relational structure of the data.
    • Node and edge data are stored separately but are interdependent. Removing a node also implies removing its edges, and removing edges changes the characteristics of nodes.
    • In addition, the relational structure means this holds not only for adjacent nodes and edges, but potentially for ones many steps away. Adding or removing one node or edge could change the characteristics of every single other node or edge.
    • That is less of a problem for local network characteristics (e.g., a node’s degree on level 1). However, many node and edge characteristics, such as betweenness, depend on the structure of the whole network.
    • That’s mainly why graph computing is slightly more messy, and needs its own mathematical tools and software for graph computing (“graph” as in network, not as in figure).
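The difference between local and global characteristics can be sketched on a toy graph. The example below is only an illustration (the ring graph and the object names g_toy and g_cut are made up, not part of the case data): removing a single edge changes the degree of just two nodes, but the betweenness of all of them.

```r
library(igraph)

# Toy graph: an undirected ring of 6 nodes, so all nodes start out identical
g_toy <- make_ring(6)

# Remove one single edge (between nodes 1 and 2); the ring becomes a path
g_cut <- delete_edges(g_toy, E(g_toy, P = c(1, 2)))

# Local characteristic: only the two endpoints of the removed edge change
degree_changed <- which(degree(g_toy) != degree(g_cut))

# Global characteristic: betweenness depends on all shortest paths
betw_changed <- which(betweenness(g_toy) != betweenness(g_cut))

degree_changed
betw_changed
```

Comparing the two vectors shows how far the effect of one small change can travel through the network.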

2 Network Structures II

2.1 Introduction to the case

Emmanuel Lazega, The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership, Oxford University Press (2001).

This data set comes from a network study of a corporate law partnership that was carried out in a Northeastern US corporate law firm, referred to as SG&R, 1988-1991 in New England. It includes (among others) measurements of networks among the 71 attorneys (partners and associates) of this firm, i.e. their strong-coworker network, advice network, friendship network, and indirect control networks. Various members’ attributes are also part of the dataset, including seniority, formal status, office in which they work, gender, law school attended, individual performance measurements (hours worked, fees brought in), attitudes concerning various management policy options, etc. This dataset was used to identify social processes such as bounded solidarity, lateral control, quality control, knowledge sharing, balancing powers, regulation, etc. among peers.

The setting:

  • What do corporate lawyers do? Litigation and corporate work.
  • Division of work and interdependencies.
  • Three offices, no departments, built-in pressures to grow, intake and assignment rules.
  • Partners and associates: hierarchy, up or out rule, billing targets.
  • Partnership agreement (sharing benefits equally, 90% exclusion rule, governance structure, elusive committee system) and incompleteness of the contracts.
  • Informal, unwritten rules (ex: no moonlighting, no investment in buildings, no nepotism, no borrowing to pay partners, etc.).
  • Huge incentives to behave opportunistically; thus the dataset is appropriate for the study of social processes that make cooperation among rival partners possible.
  • Sociometric name generators used to elicit coworkers, advice, and ‘friendship’ ties at SG&R: “Here is the list of all the members of your Firm.”

The networks were created according to the following questionnaire:

  • Strong coworkers network: “Because most firms like yours are also organized very informally, it is difficult to get a clear idea of how the members really work together. Think back over the past year, consider all the lawyers in your Firm. Would you go through this list and check the names of those with whom you have worked with. By ‘worked with’ I mean that you have spent time together on at least one case, that you have been assigned to the same case, that they read or used your work product or that you have read or used their work product; this includes professional work done within the Firm like Bar association work, administration, etc.”
  • Basic advice network: “Think back over the past year, consider all the lawyers in your Firm. To whom did you go for basic professional advice? For instance, you want to make sure that you are handling a case right, making a proper decision, and you want to consult someone whose professional opinions are in general of great value to you. By advice I do not mean simply technical advice.”
  • ‘Friendship’ network: “Would you go through this list, and check the names of those you socialize with outside work. You know their family, they know yours, for instance. I do not mean all the people you are simply on a friendly level with, or people you happen to meet at Firm functions.”

2.2 Preparation

2.2.1 Load the data

library(data.table) # fread()
library(dplyr)      # mutate(), recode(), tibble()
library(magrittr)   # %>% and %<>%
library(skimr)      # skim()
mat.friendship <- fread("../input/socio/ELfriend.dat", data.table = FALSE) %>% as.matrix()
mat.advice <- fread("../input/socio/ELadv.dat", data.table = FALSE) %>% as.matrix()
mat.work <- fread("../input/socio/ELwork.dat", data.table = FALSE) %>% as.matrix()
attributes <- fread("../input/socio/ELattr.dat", data.table = FALSE)

The three networks refer to cowork, friendship, and advice. The first 36 respondents are the partners in the firm.

2.2.2 Cleaning up

The attribute variables in the file ELattr.dat are:

  • seniority status (1=partner; 2=associate)
  • gender (1=man; 2=woman)
  • office (1=Boston; 2=Hartford; 3=Providence)
  • years with the firm
  • age
  • practice (1=litigation; 2=corporate)
  • law school (1=Harvard, Yale; 2=UConn; 3=other)
colnames(attributes) <- c("id", "seniority", "gender", "office", "tenure", "age", "practice", "school")
   
attributes %<>%
  mutate(seniority = recode(seniority, "1" = "Partner", "2" = "Associate"),
         gender = recode(gender, "1" = "Man", "2" = "Woman"),
         office = recode(office, "1" = "Boston", "2" = "Hartford", "3" = "Providence"),
         practice = recode(practice, "1" = "Litigation", "2" = "Corporate"),
         school = recode(school, "1" = "Harvard, Yale", "2" = "Ucon", "3" = "Others"))   
    
head(attributes)
skim(attributes)

2.2.3 Generate the graph

require(igraph)
g <- graph_from_adjacency_matrix(mat.advice, 
                                 mode = "directed")
g$type <- "advice" # the graph is built from the advice matrix (mat.advice)

V(g)$name <- 1:vcount(g)
V(g)$gender <- attributes$gender
V(g)$office <- attributes$office
V(g)$age <- attributes$age
V(g)$tenure <- attributes$tenure
V(g)$practice <- attributes$practice
V(g)$school <- attributes$school
E(g)$type <- "advice" # edges are advice ties (mat.advice)
g 
## IGRAPH 084e483 DN-- 71 892 -- 
## + attr: type (g/c), name (v/c), gender (v/c), office (v/c), age
## | (v/n), tenure (v/n), practice (v/c), school (v/c), type (e/c)
## + edges from 084e483 (vertex names):
##  [1] 1 ->2  1 ->17 1 ->20 2 ->1  2 ->6  2 ->17 2 ->20 2 ->22 2 ->24 2 ->26
## [11] 3 ->2  3 ->6  3 ->14 3 ->17 3 ->18 3 ->28 3 ->30 4 ->1  4 ->2  4 ->3 
## [21] 4 ->6  4 ->9  4 ->12 4 ->13 4 ->14 4 ->17 4 ->19 4 ->20 4 ->21 4 ->22
## [31] 4 ->24 4 ->26 4 ->28 4 ->29 5 ->1  5 ->6  5 ->11 5 ->18 7 ->2  7 ->10
## [41] 7 ->28 7 ->34 8 ->1  8 ->11 9 ->11 9 ->12 9 ->17 10->2  10->16 10->17
## [51] 10->24 10->26 10->29 10->34 11->1  11->8  11->12 11->17 11->21 12->2 
## [61] 12->4  12->8  12->9  12->10 12->11 12->13 12->14 12->15 12->16 12->17
## + ... omitted several edges

Let’s take a look.

plot(g,
     layout = layout_with_fr(g),
     vertex.size = 1 + sqrt(degree(g, mode = "in")), # node size by in-degree
     vertex.color = as.numeric(factor(V(g)$office)), # node color by office
     vertex.label = NA,                              # labels suppressed (the argument was given twice)
     edge.arrow.size = 0.5)

2.3 Network effects & structures

2.3.1 Node level (local)

We could look at all the node level characteristics (degree, betweenness etc.) again, but for the sake of time I will skip that for now, since it’s all already in the last notebook.

2.3.2 Network level (global)

Ok, let’s do the whole exercise of getting the main determinants of the network structure again. We can look at the classical structural determinants.

# Get density of a graph
edge_density(g)
## [1] 0.1794769
# Get the diameter of the graph g
diameter(g, directed = TRUE)
## [1] 6
# Get the average path length of the graph g
mean_distance(g, directed = TRUE)
## [1] 2.243267
# Transitivity
transitivity(g, type ="global")
## [1] 0.4787826
# Reciprocity
reciprocity(g)
## [1] 0.3923767
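As a quick sanity check, the density can be reproduced by hand. In a directed graph without self-loops, n nodes allow n(n-1) possible edges, so the density is the observed number of edges divided by that. The sketch below only uses the node and edge counts printed in the graph summary above (71 and 892).

```r
n <- 71   # number of lawyers (nodes), from the graph summary
m <- 892  # number of directed ties (edges), from the graph summary

# Share of all possible directed ties that are actually present
density_by_hand <- m / (n * (n - 1))
density_by_hand
```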

The last two make even more sense in social networks, don’t they?

We have another important concept that often explains edge formation: assortativity, also called homophily. This is a measure of how preferentially vertices attach to other vertices with similar attributes. In other words: how much “birds of a feather flock together”.

Let’s first look at whether people of similar tenure flock together.

assortativity(g, V(g)$tenure, directed = TRUE)
## [1] 0.294341

What about people from elite universities?

assortativity(g, V(g)$school == "Harvard, Yale", directed = TRUE)
## [1] 0.07980417
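For categorical attributes, igraph also offers assortativity_nominal(), which expects integer group codes instead of numeric values. A minimal sketch on a made-up toy graph (two triangles joined by one edge; g_toy and group are illustrative names only, not the law firm data):

```r
library(igraph)

# Two triangles (1-2-3 and 4-5-6) joined by a single edge 3-4
g_toy <- make_graph(c(1,2, 2,3, 1,3, 4,5, 5,6, 4,6, 3,4), directed = FALSE)

# A categorical node attribute that coincides with the two triangles
group <- c("A", "A", "A", "B", "B", "B")

# assortativity_nominal() needs integer codes starting at 1
r_nominal <- assortativity_nominal(g_toy,
                                   types = as.integer(factor(group)),
                                   directed = FALSE)
r_nominal
```

Since six of the seven edges stay within a group, the coefficient is clearly positive. The same call could be tried with a categorical attribute such as V(g)$school or V(g)$office.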

Lastly, what about the popularity (or “Matthew”) effect?

assortativity(g, degree(g, mode = "in"), directed = TRUE)
## [1] 0.1869548

Also not that much.

One more thing we didn’t talk about yet: Small worlds.

Small worlds are an interesting network structure, combining short path lengths between the nodes with a high clustering coefficient. That means that we have small interconnected clusters, which are in turn connected by gatekeepers (via the edges we call bridges, which span structural holes).

This leads to an interesting setup, which has proven to be conducive to efficient communication and fast diffusion of information in social networks.

For now, we calculate it in a simple way:

transitivity(g, type ="global") / mean_distance(g, directed = TRUE)
## [1] 0.213431

However, by now you probably wonder how to interpret these numbers. Are they high, low, or whatever? What is the reference? In fact, it’s very hard to say. The best way to say something about that is to compare them with what a random network would look like.

So, let’s create a random network. Here, we use the erdos.renyi.game() function. With type = "gnm", it creates a network with a given number of nodes and edges (and therefore the same edge density as our network), but where the edges are placed completely at random.

g.r <- erdos.renyi.game(n = gorder(g), 
                        p.or.m = gsize(g), 
                        type = "gnm",
                        directed = TRUE,
                        loops = FALSE)
plot(g.r)
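Newer igraph versions document sample_gnm() as the generator for this kind of random graph (a fixed number of nodes and edges). A minimal sketch, assuming the node and edge counts of our network (71 and 892) are typed in directly rather than taken from g:

```r
library(igraph)

set.seed(42)  # make the random graph reproducible

# G(n, m) random graph: exactly n nodes and m edges, placed at random
g_rand <- sample_gnm(n = 71, m = 892, directed = TRUE, loops = FALSE)

# Same size and density as the observed network, but no social structure
c(nodes = gorder(g_rand),
  edges = gsize(g_rand),
  density = edge_density(g_rand))
```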

Looks kind of different. However, one randomly created network doesn’t provide a good baseline. So, let’s create a bunch, and compare our network to the randomly generated ones.

# Generate 1000 random graphs
g.l <- vector('list',1000)
  
for(i in 1:1000){
  g.l[[i]] <- erdos.renyi.game(n = gorder(g), 
                        p.or.m = gsize(g), 
                        type = "gnm",
                        directed = TRUE,
                        loops = FALSE)
}

Now we can see how meaningful our observed network statistics are, by comparing them with the distribution of the same statistics across the random networks.

# Calculate the statistics for each of the 1000 random graphs (one value per graph)
dist.r <- unlist(lapply(g.l, mean_distance, directed = TRUE))
cc.r <- unlist(lapply(g.l, transitivity, type ="global"))
rp.r <- unlist(lapply(g.l, reciprocity))

Let’s see:

stats.advice <- tibble(density = edge_density(g),
                       diameter = diameter(g, directed = TRUE),
                       reciprocity = reciprocity(g),
                       reciprocity.score = mean(reciprocity(g) > rp.r),
                       distance = mean_distance(g, directed = TRUE),
                       distance.score = mean(mean_distance(g, directed = TRUE) > dist.r),
                       clustering = transitivity(g, type ="global"),
                       clustering.score = mean(transitivity(g, type ="global")  > cc.r),
                       # clustering and path length, each relative to the random baseline
                       small.world = (transitivity(g, type ="global") / mean(cc.r)) / (mean_distance(g, directed = TRUE) / mean(dist.r)) )

stats.advice
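The logic of comparing against random networks can be sketched in isolation. The toy example below is an illustration under made-up assumptions (the observed graph is two fully connected groups of five joined by one edge, and 200 random graphs are used): it computes the share of random graphs whose clustering is lower than the observed one.

```r
library(igraph)

set.seed(1)

# Toy "observed" network: two cliques of 5 nodes, joined by one bridge
g_obs <- add_edges(disjoint_union(make_full_graph(5), make_full_graph(5)),
                   c(5, 6))
obs_cc <- transitivity(g_obs, type = "global")

# Baseline: 200 random graphs with the same number of nodes and edges
rand_cc <- replicate(200,
                     transitivity(sample_gnm(gorder(g_obs), gsize(g_obs)),
                                  type = "global"))

# Share of random graphs with lower clustering than observed
share_lower <- mean(obs_cc > rand_cc)

c(observed = obs_cc, random_mean = mean(rand_cc), share_lower = share_lower)
```

A share close to 1 suggests that the observed clustering is very unlikely to arise by chance in a network of this size and density.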

2.4 Your turn

It’s time for you to explore the data a bit on your own. You can find it here.

3 Multi-Modal Networks

Now it’s time to talk about an interesting type of network: multi-modal networks. Such a network has several “modes”, meaning it connects entities on different conceptual levels. The most common one is the 2-mode (or bipartite) network. Examples are Author \(\rightarrow\) Paper, Inventor \(\rightarrow\) Patent, or Member \(\rightarrow\) Club networks. Here, the elements in the different modes represent different things.

We can analyse them separately (and sometimes we should), but often it’s helpful to “project” them onto one mode. Here, we connect two nodes of one mode through their joint association with a node of the other mode.

While that sounds simple, it can be a very powerful technique, as I will demonstrate now.
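To make the idea of a projection concrete, here is a minimal sketch with a made-up author-by-paper incidence matrix (four hypothetical authors, three hypothetical papers). Multiplying the matrix with its transpose gives the two 1-mode projections.

```r
# Rows are authors, columns are papers; 1 means "author wrote paper"
inc <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 1, 1,
                0, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("author", 1:4), paste0("paper", 1:3)))

# Author projection: entry [i, j] counts the papers authors i and j share
author_proj <- tcrossprod(inc)   # same as inc %*% t(inc)

# Paper projection: entry [i, j] counts the authors papers i and j share
paper_proj <- crossprod(inc)     # same as t(inc) %*% inc

author_proj
paper_proj
```

The diagonals carry the node totals (papers per author, authors per paper), which is why they are usually dropped before the result is turned into a graph.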

4 Case study: Bibliographic networks

4.1 Basics

Let’s talk about bibliographic networks. In short, these are networks between documents which cite each other. These are commonly academic publications, but can also be patents or policy reports. Conceptually, we can see them as 2-mode networks between articles and their references. That helps us to apply some interesting metrics, such as:

  • Direct citations
  • Bibliographic coupling
  • Co-citations

Interestingly, different projections of this 2-mode network give the whole resulting 1-mode network a different meaning.

I will illustrate this in more detail in the following.

4.2 Doing it by hand

Let’s imagine we do it the hard way. We download some bibliographic data, and have to do all the munging on our own, till we end up with a nice network representation. Let’s go through some of these steps together.

The example is based on some of my own work, which is currently under revision, but made available for you:

Rakas, M and Hain, D, (under revision), “Innovation System Research: Where It Came From, and What It Is Now”

Let’s get started. I will load some bibliographic data (selection process explained in the paper) on articles concerned with the field of “Innovation Studies”. It already went through some upfront cleaning, but is very similar to what you get when you download data from WoS.

rm(list=ls())

articles <- readRDS("../input/biblio/publications.RDS")

articles %<>%
  select(SR, AU, TI, JI, PY, AU_UN, DE, TC, NR, CR) %>%
  rename(article = SR,
         author = AU,
         title = TI,
         journal = JI,
         year = PY,
         affiliation = AU_UN,
         keywords = DE,
         citations = TC,
         references = NR,
         reference.list = CR)

articles %>%
  arrange(desc(citations)) %>%
  head(20)

So, where are the links to the references? It’s a bit messy: they are all found in the CR field (renamed to reference.list above), separated by ;.

articles[1, "reference.list"]

I will now transform them into an article \(\rightarrow\) reference edgelist. Since it’s a lot of data, I will use the data.table package here. I usually avoid it, because I hate the syntax. However, it’s just way faster.

library(data.table) # data.table(), melt()
library(stringr)    # str_split_fixed()
citation.el <- data.table(article = articles$article, 
                          str_split_fixed(articles$reference.list, ";", max(articles$references, na.rm = TRUE))) 

citation.el <- melt(citation.el, id.vars = "article")[, variable:= NULL][value!=""]

citation.el %<>%
  rename(reference = value) %>%
  arrange(article,reference)
head(citation.el)
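The reshaping step may be easier to follow on a tiny example. The sketch below uses base R only and two made-up articles (the names toy, refs and el are illustrative); it is slower than the data.table version on real data, but it shows the same idea: split the reference string, then repeat each article once per reference.

```r
# Two hypothetical articles with ";"-separated reference strings
toy <- data.frame(article = c("A", "B"),
                  reference.list = c("R1;R2;R3", "R2;R4"),
                  stringsAsFactors = FALSE)

# Split every reference string into a character vector
refs <- strsplit(toy$reference.list, ";", fixed = TRUE)

# Long format: one row per article -> reference link
el <- data.frame(article = rep(toy$article, lengths(refs)),
                 reference = unlist(refs),
                 stringsAsFactors = FALSE)
el
```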

Next, I will transform this into a sparse 2-mode matrix. I make it sparse because it’s way more efficient.

library(Matrix)
mat <- spMatrix(nrow=length(unique(citation.el$article)),
                ncol=length(unique(citation.el$reference)),
                i = as.numeric(factor(citation.el$article)),
                j = as.numeric(factor(citation.el$reference)),
                x = rep(1, length(as.numeric(citation.el$article))) ) 
row.names(mat) <- levels(factor(citation.el$article))
colnames(mat) <- levels(factor(citation.el$reference))

str(mat)
## Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
##   ..@ i       : int [1:244252] 0 0 0 0 0 0 0 0 0 0 ...
##   ..@ j       : int [1:244252] 10526 14911 14934 15002 15291 17906 19745 20899 23183 23860 ...
##   ..@ Dim     : int [1:2] 6370 36611
##   ..@ Dimnames:List of 2
##   .. ..$ : chr [1:6370] "(HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG" "AARSTAD J, 2016, RES POLICY" "ABDI M, 2012, J INT BUS STUD" "ABDIH Y, 2006, IMF STAFF PAP" ...
##   .. ..$ : chr [1:36611] "A D 1994 POSTBUREAUCRATIC ORG" "A W 1998 MANAGING TOTAL QUALI" "AAGE T 2004 DAN RES UN IND DYN D" "AAGE T 2006 THESIS COPENHAGEN BU" ...
##   ..@ x       : num [1:244252] 1 1 1 1 1 1 1 1 1 1 ...
##   ..@ factors : list()
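To see what such a sparse 2-mode matrix looks like at a readable scale, here is a minimal sketch with a made-up edge list of five links. It uses sparseMatrix() from the Matrix package, a close relative of the spMatrix() call above; only the non-zero cells are stored.

```r
library(Matrix)

# Hypothetical article -> reference edge list
el <- data.frame(article = c("A", "A", "A", "B", "B"),
                 reference = c("R1", "R2", "R3", "R2", "R4"),
                 stringsAsFactors = FALSE)

art <- factor(el$article)
ref <- factor(el$reference)

# Row index = article, column index = reference, value 1 = "cites"
m_toy <- sparseMatrix(i = as.integer(art),
                      j = as.integer(ref),
                      x = rep(1, nrow(el)),
                      dims = c(nlevels(art), nlevels(ref)),
                      dimnames = list(levels(art), levels(ref)))
m_toy
```

Empty cells are printed as dots. With 6370 articles and 36611 references, as in the output above, only a tiny fraction of the cells is filled, which is where the memory savings come from.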

Here again, I use an efficient way to create the 1-mode projections: multiplying the matrix with its transposed version (m %*% t(m)). For those who still remember some matrix algebra, that will sound familiar.

# article x article: number of shared references (bibliographic coupling)
mat.art <- tcrossprod(mat)
# reference x reference: number of articles citing both (co-citation)
mat.ref <- crossprod(mat)
rm(mat)

So far so good, let’s put it in a graph. I also set the attributes right away.

require(igraph)
g <- graph_from_adjacency_matrix(mat.art, 
                                 mode = "undirected", 
                                 weighted = T, 
                                 diag = F) ; rm(mat.art)

g <- simplify(g, 
              remove.multiple = T, 
              remove.loops = T, 
              edge.attr.comb = "sum")

temp <- tibble(article = V(g)$name) %>%
  left_join(articles %>% select(article, year, citations, references), by = "article")

g <- set_vertex_attr(g, "year", value = temp$year)
g <- set_vertex_attr(g, "citations", value = temp$citations)
g <- set_vertex_attr(g, "references", value = temp$references)
rm(temp)

g
## IGRAPH 106adcb UNW- 6370 3801377 -- 
## + attr: name (v/c), year (v/n), citations (v/n), references (v/n),
## | weight (e/n)
## + edges from 106adcb (vertex names):
## [1] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ABDI M, 2012, J INT BUS STUD  
## [2] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ABEBE GK, 2013, AGRIC SYST    
## [3] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ACS ZJ, 2014, RES POLICY      
## [4] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ADJEI-NSIAH S, 2016, CAH AGRIC
## [5] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ADNER R, 2001, MANAGE SCI     
## + ... omitted several edges

I will now do some Jaccard weighting on the edges, to get a nicer distribution.

E(g)$weight.count <- E(g)$weight
el <- ends(g, E(g), names = FALSE) # vertex ids of both endpoints, computed once
i <- V(g)$references[el[, 1]] # number of references of the first node of every edge
j <- V(g)$references[el[, 2]] # number of references of the second node of every edge
E(g)$weight <- E(g)$weight.count / (i + j - E(g)$weight.count) 
rm(i, j, el)
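The Jaccard weight divides the number of shared references by the number of distinct references of both articles together (the intersection over the union). A small worked sketch with three made-up articles and four made-up references:

```r
# Rows are articles, columns are references; 1 means "article cites reference"
inc <- matrix(c(1, 1, 1, 0,
                0, 1, 0, 1,
                0, 1, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("R1", "R2", "R3", "R4")))

shared <- tcrossprod(inc)   # number of shared references per article pair
n_refs <- diag(shared)      # number of references per article

# Jaccard = shared / (refs of i + refs of j - shared)
jaccard <- shared / (outer(n_refs, n_refs, "+") - shared)
diag(jaccard) <- 0          # self-similarity is not of interest
jaccard
```

A and B share one of four distinct references (0.25), while A and C share two of four (0.5). The raw counts would not reflect that B has a much shorter reference list.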

And delete the weak edges, to create more sparsity.

g <- delete.edges(g, E(g)[weight < quantile(weight, 0.1, na.rm = T)]) 
g <- delete.vertices(g, strength(g) == 0)
g <- delete.vertices(g, strength(g) < quantile(strength(g), 0.25, na.rm = T) )
g
## IGRAPH 482f5c3 UNW- 4777 3015453 -- 
## + attr: name (v/c), year (v/n), citations (v/n), references (v/n),
## | weight (e/n), weight.count (e/n)
## + edges from 482f5c3 (vertex names):
## [1] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ACS ZJ, 2014, RES POLICY       
## [2] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ADJEI-NSIAH S, 2016, CAH AGRIC 
## [3] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ADNER R, 2001, MANAGE SCI      
## [4] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ADNER R, 2002, STRATEG MANAGE J
## [5] (HANS) DE HAAN J, 2011, TECHNOL FORECAST SOC CHANG--ADNER R, 2016, STRATEG MANAGE J
## + ... omitted several edges

Et voilà, we can start the analysis. However, you know the rest by now, so I will skip it. Instead, I will show you how to do all of that far more conveniently.

4.3 Fun with the bibliometrix package

Lately, the bibliometrix package has become extremely good, and is by now almost suitable to replace my hand-made workflows. So, I will spare you the data munging, and demonstrate how to use its nice built-in functionality here. By doing so, you will develop a lot of intuition on network projection and aggregation on different levels.

rm(list = ls())
require(bibliometrix)
?bibliometrix

4.3.1 Loading the data

So, let’s load some data. We could go on with my “Innovation System” data, but I have a better idea. Since it appeared appropriate, I went to Web of Science and downloaded the most cited papers which have “Network Analysis” in their title, abstract, or keywords.

  • Data source: Clarivate Analytics Web of Science (http://apps.webofknowledge.com)
  • Data format: Plaintext
  • Query: Topic = “Network Analysis”
  • Timespan: 2008-2018
  • Document Type: Articles
  • Language: English
  • Query date: October, 2018
  • Selection: 500 most cited

We now just read the plain data with the built-in convert2df() function

M <- convert2df(readFiles("../input/biblio/biblio_nw1.txt"), 
                dbsource = "isi",
                format = "plaintext")
## 
## Converting your isi collection into a bibliographic dataframe
## 
## Articles extracted   100 
## Articles extracted   200 
## Articles extracted   300 
## Articles extracted   400 
## Articles extracted   500 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!
head(M)

4.3.2 Descriptive Analysis

Although bibliometrics is mainly known for quantifying the scientific production and measuring its quality and impact, it is also useful for displaying and analysing the intellectual, conceptual and social structures of research as well as their evolution and dynamical aspects.

In this way, bibliometrics aims to describe how specific disciplines, scientific domains, or research fields are structured and how they evolve over time. In other words, bibliometric methods help to map the science (so-called science mapping) and are very useful in the case of research synthesis, especially for the systematic ones.

Bibliometrics is an academic science founded on a set of statistical methods, which can be used to analyze scientific big data quantitatively and their evolution over time and discover information. Network structure is often used to model the interaction among authors, papers/documents/articles, references, keywords, etc.

Bibliometrix is an open-source software for automating the stages of data-analysis and data-visualization. After converting and uploading bibliographic data in R, Bibliometrix performs a descriptive analysis and different research-structure analysis.

Descriptive analysis provides some snapshots about the annual research development, the top “k” productive authors, papers, countries and most relevant keywords.

4.3.2.1 Main findings about the collection

results <- biblioAnalysis(M)
summary(results, 
        k = 20, 
        pause = F)
## 
## 
## Main Information about data
## 
##  Documents                             500 
##  Sources (Journals, Books, etc.)       268 
##  Keywords Plus (ID)                    2490 
##  Author's Keywords (DE)                1206 
##  Period                                2008 - 2016 
##  Average citations per documents       150.6 
## 
##  Authors                               3562 
##  Author Appearances                    3889 
##  Authors of single authored documents  17 
##  Authors of multi authored documents   3545 
## 
##  Documents per Author                  0.14 
##  Authors per Document                  7.12 
##  Co-Authors per Documents              7.78 
##  Collaboration Index                   7.51 
##  
##  Document types                     
##  J                                     496 
##  S                                     4 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     2008       65
##     2009       92
##     2010       83
##     2011       79
##     2012       66
##     2013       38
##     2014       40
##     2015       27
##     2016       10
## 
## Annual Percentage Growth Rate -20.86186 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1   HORVATH S           20  HORVATH S                        3.88
## 2   GESCHWIND DH        12  LEYDESDORFF L                    2.33
## 3   LANGFELDER P         8  DEARING JW                       2.00
## 4   MILLER JA            7  LANGFELDER P                     1.92
## 5   HE Y                 6  GESCHWIND DH                     1.66
## 6   BORSBOOM D           5  BODIN O                          1.50
## 7   COPPOLA G            5  BOSCHMA R                        1.50
## 8   ZHANG B              5  DAWSON S                         1.50
## 9   BASSETT DS           4  DING Y                           1.50
## 10  BULLMORE ET          4  ERNSTSON H                       1.33
## 11  CHO JH               4  INGOLD K                         1.33
## 12  GAO FY               4  JORDAN F                         1.25
## 13  KNIGHT R             4  BRANDES U                        1.17
## 14  LEYDESDORFF L        4  BLUTHGEN N                       1.14
## 15  MENON V              4  BORSBOOM D                       1.13
## 16  MILL J               4  MILLER JA                        1.13
## 17  OLDHAM MC            4  SCHENSUL JJ                      1.09
## 18  OPHOFF RA            4  MENON V                          1.07
## 19  SAITO K              4  HE Y                             1.06
## 20  SMITH SM             4  ASHTON W                         1.00
## 
## 
## Top manuscripts per citations
## 
##                            Paper            TC TCperYear
## 1  RUBINOV M, 2010, NEUROIMAGE            2848     356.0
## 2  LANGFELDER P, 2008, BMC BIOINFORMATICS 2152     215.2
## 3  SMITH SM, 2009, P NATL ACAD SCI USA    2004     222.7
## 4  JOSTINS L, 2012, NATURE                1790     298.3
## 5  BUCKNER RL, 2009, J NEUROSCI           1274     141.6
## 6  VOINEAGU I, 2011, NATURE                752     107.4
## 7  DELOUKAS P, 2013, NAT GENET             703     140.6
## 8  EAGLE N, 2009, P NATL ACAD SCI USA      682      75.8
## 9  CHEN J, 2009, NUCLEIC ACIDS RES         672      74.7
## 10 THIELE I, 2010, NAT PROTOC              601      75.1
## 11 FRANSSON P, 2008, NEUROIMAGE            572      57.2
## 12 SUPEKAR K, 2008, PLOS COMPUT BIOL       539      53.9
## 13 XUE J, 2014, IMMUNITY                   531     132.8
## 14 FOWLER JH, 2008, BRIT MED J             503      50.3
## 15 MILL J, 2008, AM J HUM GENET            480      48.0
## 16 BAILEY P, 2016, NATURE                  452     226.0
## 17 AIROLDI EM, 2008, J MACH LEARN RES      443      44.3
## 18 SUPEKAR K, 2009, PLOS BIOL              413      45.9
## 19 BARBERAN A, 2012, ISME J                383      63.8
## 20 GARDY JL, 2011, NEW ENGL J MED          369      52.7
## 
## 
## Most Productive Countries (of corresponding authors)
## 
##         Country   Articles    Freq SCP MCP MCP_Ratio
## 1  USA                 228 0.45691 159  69     0.303
## 2  CHINA                35 0.07014  18  17     0.486
## 3  UNITED KINGDOM       34 0.06814  16  18     0.529
## 4  NETHERLANDS          27 0.05411  17  10     0.370
## 5  GERMANY              26 0.05210  14  12     0.462
## 6  CANADA               20 0.04008   9  11     0.550
## 7  ITALY                18 0.03607   7  11     0.611
## 8  AUSTRALIA            16 0.03206   6  10     0.625
## 9  SPAIN                11 0.02204   3   8     0.727
## 10 SWEDEN               11 0.02204   6   5     0.455
## 11 SWITZERLAND          10 0.02004   6   4     0.400
## 12 FRANCE                7 0.01403   4   3     0.429
## 13 KOREA                 7 0.01403   4   3     0.429
## 14 JAPAN                 6 0.01202   6   0     0.000
## 15 BELGIUM               5 0.01002   1   4     0.800
## 16 AUSTRIA               4 0.00802   2   2     0.500
## 17 IRELAND               4 0.00802   2   2     0.500
## 18 FINLAND               3 0.00601   1   2     0.667
## 19 GEORGIA               3 0.00601   3   0     0.000
## 20 BRAZIL                2 0.00401   0   2     1.000
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  USA                      39031                     171.2
## 2  UNITED KINGDOM            7023                     206.6
## 3  CHINA                     3819                     109.1
## 4  CANADA                    3440                     172.0
## 5  GERMANY                   3344                     128.6
## 6  NETHERLANDS               3132                     116.0
## 7  AUSTRALIA                 2128                     133.0
## 8  ITALY                     2046                     113.7
## 9  SWEDEN                    1502                     136.5
## 10 SPAIN                     1265                     115.0
## 11 SWITZERLAND               1141                     114.1
## 12 JAPAN                     1002                     167.0
## 13 FRANCE                     801                     114.4
## 14 KOREA                      735                     105.0
## 15 IRELAND                    650                     162.5
## 16 AUSTRIA                    540                     135.0
## 17 GEORGIA                    429                     143.0
## 18 BELGIUM                    389                      77.8
## 19 GREECE                     384                     192.0
## 20 FINLAND                    324                     108.0
## 
## 
## Most Relevant Sources
## 
##                                                                     Sources        Articles
## 1  PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA       25
## 2  PLOS ONE                                                                              22
## 3  NEUROIMAGE                                                                            15
## 4  NATURE                                                                                10
## 5  ISME JOURNAL                                                                           9
## 6  NUCLEIC ACIDS RESEARCH                                                                 9
## 7  CELL                                                                                   7
## 8  GENOME RESEARCH                                                                        7
## 9  BIOINFORMATICS                                                                         6
## 10 BMC BIOINFORMATICS                                                                     6
## 11 PLOS GENETICS                                                                          6
## 12 BRAIN                                                                                  5
## 13 CANCER RESEARCH                                                                        5
## 14 JOURNAL OF INFORMETRICS                                                                5
## 15 MOLECULAR SYSTEMS BIOLOGY                                                              5
## 16 BMC GENOMICS                                                                           4
## 17 DECISION SUPPORT SYSTEMS                                                               4
## 18 EXPERT SYSTEMS WITH APPLICATIONS                                                       4
## 19 JOURNAL OF NEUROSCIENCE                                                                4
## 20 LANDSCAPE AND URBAN PLANNING                                                           4
## 
## 
## Most Relevant Keywords
## 
##    Author Keywords (DE)      Articles  Keywords-Plus (ID)     Articles
## 1   SOCIAL NETWORK ANALYSIS        43 NETWORK ANALYSIS              41
## 2   NETWORK ANALYSIS               41 EXPRESSION                    32
## 3   GRAPH THEORY                   14 GENE-EXPRESSION               29
## 4   SOCIAL NETWORKS                13 NETWORKS                      26
## 5   SYSTEMS BIOLOGY                10 ORGANIZATION                  25
## 6   FUNCTIONAL CONNECTIVITY         9 IDENTIFICATION                24
## 7   CONNECTIVITY                    7 COMPLEX NETWORKS              22
## 8   FMRI                            7 CENTRALITY                    21
## 9   NETWORK                         7 DISEASE                       21
## 10  CENTRALITY                      6 DYNAMICS                      20
## 11  TRACTOGRAPHY                    6 PATTERNS                      17
## 12  CLUSTERING                      5 ALZHEIMERS-DISEASE            16
## 13  MICROARRAY                      5 EVOLUTION                     16
## 14  NETWORKS                        5 MODEL                         16
## 15  COMMUNITY                       4 COMMUNITY STRUCTURE           15
## 16  COMPLEX NETWORKS                4 ESCHERICHIA-COLI              15
## 17  DIFFUSION TENSOR IMAGING        4 FUNCTIONAL CONNECTIVITY       15
## 18  GENE EXPRESSION                 4 PERFORMANCE                   15
## 19  METABOLOMICS                    4 BEHAVIOR                      14
## 20  MICRORNA                        4 MASS-SPECTROMETRY             14
plot(results)

4.3.2.2 Most Cited References (internally)

CR <- citations(M, 
                field = "article", 
                sep = ";")
cbind(CR$Cited[1:10])
##                                                                           [,1]
## WASSERMAN S, 1994, SOCIAL NETWORK ANAL                                      63
## WATTS DJ, 1998, NATURE, V393, P440, DOI 101038/30918                        49
## ZHANG B, 2005, STAT APPL GENET MO B, V4, DOI 102202/1544-61151128           47
## FREEMAN LC, 1979, SOC NETWORKS, V1, P215, DOI 101016/0378-8733(78)90021-7   42
## LANGFELDER P, 2008, BMC BIOINFORMATICS, V9, DOI 101186/1471-2105-9-559      37
## SHANNON P, 2003, GENOME RES, V13, P2498, DOI 101101/GR1239303               29
## OLDHAM MC, 2008, NAT NEUROSCI, V11, P1271, DOI 101038/NN2207                27
## FREEMAN LC, 1977, SOCIOMETRY, V40, P35, DOI 102307/3033543                  26
## NEWMAN MEJ, 2003, SIAM REV, V45, P167, DOI 101137/S003614450342480          26
## GRANOVETTER MS, 1973, AM J SOCIOL, V78, P1360, DOI 101086/225469            25

4.3.3 Bibliographic Coupling Analysis: The Knowledge Frontier of the Field

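Bibliographic coupling is a newer technique that has turned out to be well suited to capturing a field's current knowledge frontier. Two papers are coupled when they cite the same references, so the more references they share, the stronger the link between them. I will show you how to do it here, but in case you are interested, read my paper :)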

NetMatrix <- biblioNetwork(M, 
                           analysis = "coupling", 
                           network = "references", 
                           sep = ";")

net <-networkPlot(NetMatrix, 
            n = 50, 
            Title = "Bibliographic Coupling Network", 
            type = "fruchterman", 
            size.cex = TRUE, 
            size = 20, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)
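
To make the idea behind the coupling matrix concrete, here is a minimal sketch on a toy example. The papers `P1` to `P3` and references `R1` to `R4` are made up for illustration and are not taken from `M`. Assuming a binary paper-by-reference matrix `A`, the number of shared references between two papers is simply `A %*% t(A)`.

```r
# Toy paper-by-reference incidence matrix (hypothetical data):
# rows = citing papers, columns = cited references, 1 = "paper cites reference"
A <- matrix(c(1, 1, 1, 0,
              1, 1, 0, 0,
              0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3"),
                            c("R1", "R2", "R3", "R4")))

# Bibliographic coupling: number of references two papers have in common
coupling <- A %*% t(A)
diag(coupling) <- 0  # a paper is not coupled with itself

coupling
```

Here `P1` and `P2` share two references, `P1` and `P3` share one, and `P2` and `P3` share none, so they stay unconnected.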

4.3.4 Co-citation Analysis: The Intellectual Structure and Knowledge Bases of the field

Citation analysis is one of the main classic techniques in bibliometrics. It shows the structure of a specific field through the linkages between nodes (e.g. authors, papers, journals), while the edges can be interpreted differently depending on the network type, namely co-citation, direct citation, or bibliographic coupling.

Below there are three examples.

  • First, a co-citation network that shows relations between cited-reference works (nodes).
  • Second, a co-citation network that uses cited journals as the unit of analysis. The useful dimensions for commenting on co-citation networks are: (i) centrality and peripherality of nodes, (ii) their proximity and distance, (iii) strength of ties, (iv) clusters, (v) bridging contributions.
  • Third, a historiograph built on direct citations. It draws the intellectual linkages in historical order. The cited works of thousands of authors contained in a collection of published scientific articles are sufficient for reconstructing the historiographic structure of the field, calling out the basic works in it.
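
As a rough sketch of what "co-citation" means numerically: two references are co-cited whenever the same paper cites both of them. The toy matrix below is made up for illustration (papers `P1` to `P3`, references `R1` to `R4`); with a binary paper-by-reference matrix `A`, the co-citation counts are `t(A) %*% A`, which base R computes with `crossprod()`.

```r
# Toy paper-by-reference incidence matrix (hypothetical data):
# rows = citing papers, columns = cited references
A <- matrix(c(1, 1, 1, 0,
              1, 1, 0, 0,
              0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3"),
                            c("R1", "R2", "R3", "R4")))

# Co-citation: number of papers that cite both references
cocitation <- crossprod(A)   # same as t(A) %*% A
diag(cocitation) <- 0        # ignore a reference's co-citation with itself

cocitation
```

Note that the nodes are now the cited references, not the citing papers, which is why co-citation networks tend to surface a field's older knowledge base.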

4.3.4.1 Co-citation (cited references) analysis

Plot options:

  • n = 50 (the function plots the main 50 cited references)
  • type = “fruchterman” (the network layout is generated using the Fruchterman-Reingold Algorithm)
  • size.cex = TRUE (the size of the vertices is proportional to their degree)
  • size = 20 (the max size of vertices)
  • remove.multiple=FALSE (multiple edges are not removed)
  • labelsize = 0.7 (defines the size of vertex labels)
  • edgesize = 10 (The thickness of the edges is proportional to their strength. Edgesize defines the max value of the thickness)
  • edges.min = 5 (plots only edges with a strength greater than or equal to 5)
  • all other arguments assume the default values
NetMatrix <- biblioNetwork(M, 
                           analysis = "co-citation", 
                           network = "references", 
                           sep = ";")

net <-networkPlot(NetMatrix, 
            n = 50, 
            Title = "Co-Citation Network", 
            type = "fruchterman", 
            size.cex = TRUE, 
            size = 20, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)

4.3.4.2 Cited Journal (Source) co-citation analysis

M <- metaTagExtraction(M, "CR_SO", sep=";")

NetMatrix <- biblioNetwork(M, 
                           analysis = "co-citation", 
                           network = "sources", 
                           sep = ";")

net <-networkPlot(NetMatrix, 
            n = 50, 
            Title = "Co-Citation Network", 
            type = "auto", 
            size.cex = TRUE, 
            size = 15, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)

By the way, the results contain a “hidden” igraph object. That is new, and makes further analysis of the results possible. Great!

str(net)
## List of 3
##  $ graph      :List of 10
##   ..$ :List of 1
##   .. ..$ J NEUROSCI: 'igraph.vs' Named int [1:936] 2 2 2 2 2 2 2 2 2 2 ...
##   .. .. ..- attr(*, "names")= chr [1:936] "PLOS COMPUT BIOL" "PLOS COMPUT BIOL" "PLOS COMPUT BIOL" "PLOS COMPUT BIOL" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ PLOS COMPUT BIOL: 'igraph.vs' Named int [1:1154] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..- attr(*, "names")= chr [1:1154] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ SCIENCE: 'igraph.vs' Named int [1:2320] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..- attr(*, "names")= chr [1:2320] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ P NATL ACAD SCI USA: 'igraph.vs' Named int [1:2642] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..- attr(*, "names")= chr [1:2642] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ NAT REV NEUROSCI: 'igraph.vs' Named int [1:572] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..- attr(*, "names")= chr [1:572] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ BMC SYST BIOL: 'igraph.vs' Named int [1:715] 1 1 1 1 1 1 1 1 1 2 ...
##   .. .. ..- attr(*, "names")= chr [1:715] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ NEUROIMAGE: 'igraph.vs' Named int [1:541] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..- attr(*, "names")= chr [1:541] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ PHYS REV E: 'igraph.vs' Named int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..- attr(*, "names")= chr [1:614] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ SOC NETWORKS: 'igraph.vs' Named int [1:526] 1 1 1 1 1 2 2 2 2 2 ...
##   .. .. ..- attr(*, "names")= chr [1:526] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..$ :List of 1
##   .. ..$ PLOS BIOL: 'igraph.vs' Named int [1:661] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..- attr(*, "names")= chr [1:661] "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" "J NEUROSCI" ...
##   .. .. ..- attr(*, "env")=<weakref> 
##   .. .. ..- attr(*, "graph")= chr "6bb00e22-c82e-11e8-8918-ab43b0a1c864"
##   ..- attr(*, "class")= chr "igraph"
##  $ cluster_obj:List of 5
##   ..$ merges    : chr [1:4] "SOC NETWORKS" "SOCIAL NETWORK ANAL" "ANNU REV SOCIOL" "AM J SOCIOL"
##   ..$ modularity: chr [1:5] "PHYS REV E" "PHYS REV LETT" "PHYSICA A" "SIAM REV" ...
##   ..$ membership: chr [1:15] "J NEUROSCI" "PLOS COMPUT BIOL" "SCIENCE" "P NATL ACAD SCI USA" ...
##   ..$ names     : chr [1:6] "ADMIN SCI QUART" "ACAD MANAGE J" "MANAGE SCI" "ACAD MANAGE REV" ...
##   ..$ vcount    : chr [1:20] "BMC SYST BIOL" "NAT BIOTECHNOL" "NAT GENET" "BIOINFORMATICS" ...
##   ..- attr(*, "class")= chr "communities"
##  $ cluster_res:'data.frame': 50 obs. of  3 variables:
##   ..$ vertex        : Factor w/ 50 levels "ACAD MANAGE J",..: 47 48 7 5 38 39 40 46 49 20 ...
##   ..$ cluster       : num [1:50] 1 1 1 1 2 2 2 2 2 3 ...
##   ..$ btw_centrality: num [1:50] 8.436 2.965 0.443 4.139 4.116 ...
net$graph
## IGRAPH 6bb00e2 UN-- 50 18378 -- 
## + attr: name (v/c), deg (v/n), size (v/n), label.cex (v/n), color (v/c), community (v/n), color (e/c), num
## | (e/n), width (e/n)
## + edges from 6bb00e2 (vertex names):
##  [1] J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL
##  [5] J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL
##  [9] J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL
## [13] J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL
## [17] J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL
## [21] J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL
## [25] J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL J NEUROSCI--PLOS COMPUT BIOL
## + ... omitted several edges
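
Since `net$graph` is an ordinary igraph object, any igraph function can be applied to it. The sketch below uses a small toy graph `g` as a stand-in (the vertex names are hypothetical), so that it runs without the bibliographic data.

```r
library(igraph)

# Toy undirected graph standing in for net$graph (hypothetical vertices)
g <- graph_from_data_frame(
  data.frame(from = c("A", "A", "B", "C"),
             to   = c("B", "C", "C", "D")),
  directed = FALSE)

# Collect a few node-level measures in one table
node_stats <- data.frame(
  name        = V(g)$name,
  degree      = degree(g),
  betweenness = betweenness(g))

# Sort by degree, highest first
node_stats[order(-node_stats$degree), ]
```

With the real results, starting from `g <- net$graph` instead of the toy graph should work the same way.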

Some summary statistics. I will only provide them here, but they are available for all objects created with biblioNetwork().

netstat <- networkStat(NetMatrix)
summary(netstat, k = 10)
## 
## 
## Main statistics about the network
## 
##  Size                                  7563 
##  Density                               0.012 
##  Transitivity                          0.274 
##  Diameter                              6 
##  Degree Centralization                 0.502 
##  Closeness Centralization              0.481 
##  Betweenness Centralization            0.154 
##  Eigenvector Centralization            0.946 
##  Average path length                   2.359 
##  
## 
## 
## 
## 
## 
## Main measures of centrality and prestige of vertices
## 
## 
## Degree Centrality: Top vertices
## 
##    Vertex ID              Degree Centrality
## 1     SCIENCE                         0.514
## 2     P NATL ACAD SCI USA             0.483
## 3     NATURE                          0.423
## 4     SOC NETWORKS                    0.340
## 5     SOCIAL NETWORK ANAL             0.323
## 6     AM J SOCIOL                     0.293
## 7     PLOS ONE                        0.249
## 8     PHYS REV E                      0.236
## 9     ANNU REV SOCIOL                 0.231
## 10    ADMIN SCI QUART                 0.222
## 
## 
## Closeness Centrality: Top vertices
## 
##    Vertex ID              Closeness Centrality
## 1     SCIENCE                            0.668
## 2     P NATL ACAD SCI USA                0.654
## 3     NATURE                             0.633
## 4     SOC NETWORKS                       0.598
## 5     SOCIAL NETWORK ANAL                0.591
## 6     AM J SOCIOL                        0.580
## 7     PLOS ONE                           0.568
## 8     PHYS REV E                         0.563
## 9     ANNU REV SOCIOL                    0.559
## 10    ADMIN SCI QUART                    0.551
## 
## 
## Eigenvector Centrality: Top vertices
## 
##    Vertex ID              Eigenvector Centrality
## 1     SCIENCE                              1.000
## 2     AM J SOCIOL                          0.968
## 3     P NATL ACAD SCI USA                  0.852
## 4     ANNU REV SOCIOL                      0.837
## 5     ADMIN SCI QUART                      0.827
## 6     NATURE                               0.797
## 7     ORGAN SCI                            0.784
## 8     SOC NETWORKS                         0.743
## 9     ACAD MANAGE REV                      0.738
## 10    ACAD MANAGE J                        0.727
## 
## 
## Betweenness Centrality: Top vertices
## 
##    Vertex ID              Betweenness Centrality
## 1     SCIENCE                             0.1540
## 2     P NATL ACAD SCI USA                 0.1191
## 3     NATURE                              0.0924
## 4     SOC NETWORKS                        0.0651
## 5     SOCIAL NETWORK ANAL                 0.0639
## 6     AM J SOCIOL                         0.0405
## 7     PLOS ONE                            0.0288
## 8     PHYS REV E                          0.0242
## 9     ANNU REV SOCIOL                     0.0225
## 10    ADMIN SCI QUART                     0.0171
## 
## 
## PageRank Score: Top vertices
## 
##    Vertex ID              Pagerank Score
## 1     SCIENCE                    0.00533
## 2     P NATL ACAD SCI USA        0.00522
## 3     NATURE                     0.00460
## 4     SOC NETWORKS               0.00345
## 5     SOCIAL NETWORK ANAL        0.00323
## 6     AM J SOCIOL                0.00266
## 7     PLOS ONE                   0.00255
## 8     PHYS REV E                 0.00249
## 9     BIOINFORMATICS             0.00216
## 10    ANNU REV SOCIOL            0.00202
## 
## 
## Hub Score: Top vertices
## 
##    Vertex ID              Hub Score
## 1     SCIENCE                 1.000
## 2     AM J SOCIOL             0.968
## 3     P NATL ACAD SCI USA     0.852
## 4     ANNU REV SOCIOL         0.837
## 5     ADMIN SCI QUART         0.827
## 6     NATURE                  0.797
## 7     ORGAN SCI               0.784
## 8     SOC NETWORKS            0.743
## 9     ACAD MANAGE REV         0.738
## 10    ACAD MANAGE J           0.727
## 
## 
## Authority Score: Top vertices
## 
##    Vertex ID              Authority Score
## 1     SCIENCE                       1.000
## 2     AM J SOCIOL                   0.968
## 3     P NATL ACAD SCI USA           0.852
## 4     ANNU REV SOCIOL               0.837
## 5     ADMIN SCI QUART               0.827
## 6     NATURE                        0.797
## 7     ORGAN SCI                     0.784
## 8     SOC NETWORKS                  0.743
## 9     ACAD MANAGE REV               0.738
## 10    ACAD MANAGE J                 0.727
## 
## 
## Overall Ranking: Top vertices
## 
##    Vertex ID              Overall Ranking
## 1     SCIENCE                           1
## 2     P NATL ACAD SCI USA               2
## 3     NATURE                            3
## 4     SOC NETWORKS                      4
## 5     AM J SOCIOL                       5
## 6     SOCIAL NETWORK ANAL               6
## 7     ANNU REV SOCIOL                   7
## 8     ADMIN SCI QUART                   8
## 9     PLOS ONE                          9
## 10    ORGAN SCI                        10
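
The headline figures above are standard graph-level measures. As a sketch, here is how a few of them can be computed with igraph on a toy graph; whether `networkStat()` uses exactly these calls internally is an assumption on my side.

```r
library(igraph)

# Toy undirected graph (hypothetical vertices): a triangle A-B-C plus a pendant D
g <- graph_from_data_frame(
  data.frame(from = c("A", "A", "B", "C"),
             to   = c("B", "C", "C", "D")),
  directed = FALSE)

graph_stats <- c(
  size         = vcount(g),                   # number of vertices
  density      = edge_density(g),             # realised / possible edges
  transitivity = transitivity(g),             # global clustering coefficient
  diameter     = diameter(g),                 # longest shortest path
  avg_path     = mean_distance(g))            # average shortest path length

round(graph_stats, 3)

# Degree centrality normalised by the maximum possible degree (n - 1)
degree(g, normalized = TRUE)
```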

4.3.4.3 Historiograph - Direct citation linkages

We can also look at a historiograph of citation patterns over time.

histResults <- histNetwork(M, 
                           min.citations = quantile(M$TC,0.75), 
                           sep = ";")
## Articles analysed   100 
## Articles analysed   125
net <- histPlot(histResults, 
                n = 20, 
                size.cex=TRUE, 
                size = 5, 
                labelsize = 3, 
                arrowsize = 0.5)

## 
##  Legend
## 
##                                              Paper                                   DOI Year LCS  GCS
## 2008 - 1    LANGFELDER P, 2008, BMC BIOINFORMATICS               10.1186/1471-2105-9-559 2008  37 2152
## 2008 - 3         SUPEKAR K, 2008, PLOS COMPUT BIOL          10.1371/JOURNAL.PCBI.1000100 2008   9  539
## 2008 - 8         HORVATH S, 2008, PLOS COMPUT BIOL          10.1371/JOURNAL.PCBI.1000117 2008  15  299
## 2008 - 14              MILLER JA, 2008, J NEUROSCI        10.1523/JNEUROSCI.4098-07.2008 2008   8  224
## 2009 - 22      SMITH SM, 2009, P NATL ACAD SCI USA               10.1073/PNAS.0905267106 2009   3 2004
## 2009 - 23             BUCKNER RL, 2009, J NEUROSCI        10.1523/JNEUROSCI.5062-08.2009 2009   9 1274
## 2009 - 26               SUPEKAR K, 2009, PLOS BIOL          10.1371/JOURNAL.PBIO.1000157 2009   5  413
## 2009 - 27                     HE Y, 2009, PLOS ONE          10.1371/JOURNAL.PONE.0005226 2009   7  314
## 2009 - 35          PRELL C, 2009, SOC NATUR RESOUR             10.1080/08941920802199202 2009   3  231
## 2009 - 38                  KONOPKA G, 2009, NATURE                   10.1038/NATURE08549 2009   3  213
## 2009 - 44  BORGATTI SP, 2009, J SUPPLY CHAIN MANAG      10.1111/J.1745-493X.2009.03166.X 2009   3  185
## 2009 - 47         THEOCHARIDIS A, 2009, NAT PROTOC                10.1038/NPROT.2009.177 2009   3  171
## 2010 - 55              RUBINOV M, 2010, NEUROIMAGE      10.1016/J.NEUROIMAGE.2009.10.003 2010  18 2848
## 2010 - 59             HE Y, 2010, CURR OPIN NEUROL          10.1097/WCO.0B013E32833AA567 2010   4  287
## 2010 - 60        SKUDLARSKI P, 2010, BIOL PSYCHIAT        10.1016/J.BIOPSYCH.2010.03.035 2010   3  242
## 2010 - 66     MILLER JA, 2010, P NATL ACAD SCI USA               10.1073/PNAS.0914257107 2010  10  194
## 2011 - 75                 VOINEAGU I, 2011, NATURE                   10.1038/NATURE10110 2011   9  752
## 2011 - 83             BASSETT DS, 2011, NEUROIMAGE      10.1016/J.NEUROIMAGE.2010.09.006 2011   4  183
## 2012 - 90                 BARBERAN A, 2012, ISME J                10.1038/ISMEJ.2011.119 2012   4  383
## 2012 - 98             BASSETT DS, 2012, NEUROIMAGE      10.1016/J.NEUROIMAGE.2011.10.002 2012   3  166
## 2013 - 104  BORSBOOM D, 2013, ANNU REV CLIN PSYCHO 10.1146/ANNUREV-CLINPSY-050212-185608 2013   3  355
## 2013 - 107       BREUER K, 2013, NUCLEIC ACIDS RES                   10.1093/NAR/GKS1147 2013   3  216
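
In the legend, LCS is the local citation score (citations received from papers inside the collection) and GCS is the global citation score (all citations recorded in the database). A minimal sketch of how LCS follows from direct citations, using a made-up four-paper collection:

```r
# Hypothetical direct-citation matrix: D[i, j] = 1 means paper i cites paper j.
# Papers are sorted by year, so citations only point backwards in time.
papers <- c("A_2008", "B_2009", "C_2010", "D_2011")
D <- matrix(0, nrow = 4, ncol = 4, dimnames = list(papers, papers))
D["B_2009", "A_2008"] <- 1
D["C_2010", "A_2008"] <- 1
D["C_2010", "B_2009"] <- 1
D["D_2011", "A_2008"] <- 1

# Local citation score: citations received from within the collection
LCS <- colSums(D)
LCS
```

The oldest paper collects three local citations here, which is the kind of "basic work" a historiograph is meant to call out.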

4.3.5 The conceptual structure and context - Co-Word Analysis

Co-word networks show the conceptual structure of a field by uncovering links between concepts through term co-occurrences.

Conceptual structure is often used to understand the topics covered by scholars (so-called research front) and identify what are the most important and the most recent issues.

Dividing the whole timespan in different timeslices and comparing the conceptual structures is useful to analyze the evolution of topics over time.

Bibliometrix is able to analyze keywords, but also the terms in the articles’ titles and abstracts. It does so using network analysis, correspondence analysis (CA), or multiple correspondence analysis (MCA). CA and MCA visualise the conceptual structure in a two-dimensional plot.

We can even do way more fancy stuff with abstracts or full texts (and do so). However, I don't want to spoil Roman's sessions, so I will hold myself back here.

4.3.5.1 Co-word Analysis through Keyword co-occurrences

Plot options:

  • normalize = “association” (the vertex similarities are normalized using association strength)
  • n = 50 (the function plots the main 50 keywords)
  • type = “fruchterman” (the network layout is generated using the Fruchterman-Reingold Algorithm)
  • size.cex = TRUE (the size of the vertices is proportional to their degree)
  • size = 20 (the max size of the vertices)
  • remove.multiple=FALSE (multiple edges are not removed)
  • labelsize = 3 (defines the max size of vertex labels)
  • label.cex = TRUE (The vertex label sizes are proportional to their degree)
  • edgesize = 10 (The thickness of the edges is proportional to their strength. Edgesize defines the max value of the thickness)
  • label.n = 50 (labels are plotted only for the main 50 vertices)
  • edges.min = 2 (plots only edges with a strength greater than or equal to 2)
  • all other arguments assume the default values
NetMatrix <- biblioNetwork(M, 
                           analysis = "co-occurrences", 
                           network = "keywords", 
                           sep = ";")

net <- networkPlot(NetMatrix, 
                   normalize = "association", 
                   n = 50, 
                   Title = "Keyword Co-occurrences", 
                   type = "fruchterman", 
                   size.cex = TRUE, size = 20, remove.multiple = FALSE, 
                   edgesize = 10, 
                   labelsize = 3,
                   label.cex = TRUE,
                   label.n = 50,
                   edges.min = 2)
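
To see what `normalize = "association"` does to the edge weights, here is a sketch on a made-up co-occurrence matrix. It assumes the usual definition of association strength, i.e. the co-occurrence count divided by the product of the two keywords' occurrence counts; the keyword names and numbers are hypothetical.

```r
# Hypothetical keyword co-occurrence matrix:
# diagonal = number of documents using the keyword,
# off-diagonal = number of documents using both keywords
kw <- c("network", "centrality", "fmri")
C <- matrix(c(10, 4, 1,
               4, 5, 0,
               1, 0, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(kw, kw))

occ <- diag(C)                  # occurrences per keyword
assoc <- C / outer(occ, occ)    # association strength: c_ij / (c_i * c_j)
diag(assoc) <- 0

round(assoc, 3)
```

The normalization corrects for the fact that very frequent keywords co-occur with almost everything simply because they are frequent.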

4.3.5.2 Co-word Analysis through Correspondence Analysis

You already saw that coming, right?

CS <- conceptualStructure(M, 
                          method = "CA", 
                          field = "ID", 
                          minDegree = 10, 
                          k.max = 8, 
                          stemming = FALSE, 
                          labelsize = 8,
                          documents = 20)
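
If you wonder what correspondence analysis does under the hood, the sketch below runs a plain CA on a made-up document-by-keyword table using nothing but `svd()`. It is a textbook version for illustration and does not reproduce the preprocessing or plotting of `conceptualStructure()`.

```r
# Hypothetical document-by-keyword count table
N <- matrix(c(4, 1, 0,
              3, 2, 1,
              0, 1, 5,
              1, 0, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4),
                            c("gene", "network", "brain")))

P  <- N / sum(N)     # correspondence matrix (relative frequencies)
r  <- rowSums(P)     # row masses
cm <- colSums(P)     # column masses

# Standardised residuals from the independence model, then SVD
S   <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
dec <- svd(S)

# Principal coordinates of the keywords on the first two dimensions
col_coords <- diag(1 / sqrt(cm)) %*% dec$v[, 1:2] %*% diag(dec$d[1:2])
rownames(col_coords) <- colnames(N)

col_coords
sum(dec$d^2)   # total inertia (chi-squared statistic divided by sum(N))
```

Keywords that end up close together in `col_coords` have similar profiles across documents, which is what the two-dimensional CA map shows.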

4.3.5.3 Thematic Map

Co-word analysis draws clusters of keywords. They are considered as themes, whose density and centrality can be used in classifying themes and mapping in a two-dimensional diagram.

The thematic map is a very intuitive plot, and we can analyze themes according to the quadrant in which they are placed: (1) upper-right quadrant: motor themes; (2) lower-right quadrant: basic themes; (3) lower-left quadrant: emerging or disappearing themes; (4) upper-left quadrant: very specialized/niche themes.

Please see Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets theory field. Journal of Informetrics, 5(1), 146-166.
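
The quadrant logic can be sketched in a few lines, with centrality on the horizontal and density on the vertical axis. The themes and their values below are made up, and splitting the plane at the medians is an assumption for illustration; other cut points are possible.

```r
# Hypothetical themes with centrality and density values
themes <- data.frame(
  theme      = c("gene expression", "social networks", "fmri", "metabolomics"),
  centrality = c(0.9, 0.8, 0.2, 0.1),
  density    = c(0.7, 0.2, 0.9, 0.1))

# Split the plane at the medians (assumed cut points)
hi_cen <- themes$centrality > median(themes$centrality)
hi_den <- themes$density    > median(themes$density)

themes$quadrant <- ifelse(hi_cen & hi_den,  "motor theme",
                   ifelse(hi_cen & !hi_den, "basic theme",
                   ifelse(!hi_cen & hi_den, "niche theme",
                                            "emerging or disappearing theme")))
themes
```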

NetMatrix <- biblioNetwork(M, 
                           analysis = "co-occurrences",
                           network = "keywords", 
                           sep = ";")

S <- normalizeSimilarity(NetMatrix, 
                         type = "association")

net <- networkPlot(S,
                   n = 500, 
                   Title = "Keyword co-occurrences",
                   type = "fruchterman",
                   labelsize = 2, 
                   halo = FALSE,
                   cluster = "walktrap", 
                   remove.isolates = FALSE,
                   remove.multiple = FALSE, 
                   noloops = TRUE, 
                   weighted = TRUE,
                   label.cex = TRUE,
                   edgesize = 5, 
                   size = 1,
                   edges.min = 2)

Map <- thematicMap(net, NetMatrix, 
                   S = S,
                   minfreq = 5)
plot(Map$map)

Let's inspect the clusters we found:

clusters <- Map$words %>%
  arrange(Cluster, desc(Occurrences))

clusters %>%
  select(Cluster, Words, Occurrences) %>%
  group_by(Cluster) %>%
  mutate(n.rel = Occurrences / sum(Occurrences) ) %>%
  slice(1:3)

4.3.6 The social structure - Collaboration Analysis

Collaboration networks show how authors, institutions (e.g. universities or departments) and countries relate to others in a specific field of research. For example, the first figure below is a co-author network. It discovers regular study groups, hidden groups of scholars, and pivotal authors. The second figure is called “Edu collaboration network” and uncovers relevant institutions in a specific research field and their relations.

4.3.6.1 Author collaboration network

NetMatrix <- biblioNetwork(M %>% filter(!grepl("GESCHWIND", AU)), 
                           analysis = "collaboration",  
                           network = "authors", 
                           sep = ";")

S <- normalizeSimilarity(NetMatrix, type = "jaccard")

net <- networkPlot(S,  
                   n = 50, 
                   Title = "Author collaboration",
                   type = "auto", 
                   size = 10,
                   weighted = TRUE,
                   remove.isolates = TRUE,
                   size.cex = TRUE,
                   edgesize = 1,
                   labelsize = 0.6)
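
The Jaccard normalization used above relates the number of joint papers to the number of papers written by either author. A sketch on a made-up co-authorship matrix, assuming the usual definition (intersection over union); the author names are hypothetical:

```r
# Hypothetical co-authorship matrix:
# diagonal = papers per author, off-diagonal = joint papers
au <- c("SMITH J", "LEE K", "GARCIA M")
C <- matrix(c(10, 5, 0,
               5, 5, 1,
               0, 1, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(au, au))

n_papers <- diag(C)

# Jaccard index: joint papers / papers written by at least one of the two
jac <- C / (outer(n_papers, n_papers, "+") - C)
diag(jac) <- 0

round(jac, 3)
```

LEE K wrote all five papers together with SMITH J, yet the index is only 0.5 because SMITH J has five more papers without LEE K.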

4.3.6.2 Edu collaboration network

NetMatrix <- biblioNetwork(M, 
                           analysis = "collaboration",  
                           network = "universities", 
                           sep = ";")

net <- networkPlot(NetMatrix,  
                   n = 50, 
                   Title = "Edu collaboration",
                   type = "auto", 
                   size = 10,
                   size.cex = TRUE,
                   edgesize = 3,
                   labelsize = 0.6)

4.3.6.3 Country collaboration network

M <- metaTagExtraction(M, 
                       Field = "AU_CO", 
                       sep = ";")

NetMatrix <- biblioNetwork(M, 
                           analysis = "collaboration",  
                           network = "countries", 
                           sep = ";")

net <- networkPlot(NetMatrix,  
                   n = dim(NetMatrix)[1], 
                   Title = "Country collaboration",
                   type = "sphere", 
                   cluster = "louvain",
                   weighted = TRUE,
                   size = 10,
                   size.cex = TRUE,
                   edgesize = 1,
                   labelsize = 0.6)

Isn’t that all a lot of fun?

By now you should have realized that different levels of projection and aggregation offer almost endless possibilities for the analysis of bibliographic data!

4.4 Your turn

Finally, it's again time for you to have some fun. Check out this final exercise: here.